This homework covers the linear models and diagnostics & transforms lectures. All work must be submitted electronically. For full credit you must show all of your steps. Use of computational tools (e.g., R) is encouraged; and when you do, code inputs and outputs must be shown in-line (not as an appendix) and be accompanied by plain English that briefly explains what the code is doing.
Let \(Y_1, \dots, Y_n\) be independent Poisson random variables with mean \(\theta\).
y <- c(3, 5, 6, 5, 2, 6, 6, 7, 8, 8, 7, 8, 0, 5, 7, 6, 6, 10, 6, 5, 6, 7, 6,
9, 8, 4, 8, 7, 11, 9, 4, 4, 7, 9, 8, 6, 5, 6, 12, 10, 7, 13, 8, 12, 9, 4,
10, 8, 4, 5)
pois <-function(y, theta){
summ <- 0
for(x in y){
one <- (theta**x*exp(-theta))/factorial(x)
summ <-summ + log(one)
}
-summ
}
pois(y, 5)
## [1] 133.5273
pois(y, 6)
## [1] 121.1733
pois(y, 7)
## [1] 118.4538
Revisit the tractor data in tractor.csv from Homework 1. If necessary, first re-calcuate your estimated coefficients from the least squares fit to the data.
tractor <- read.csv("tractor.csv")
attach(tractor)
summ = summary(reg <- lm(cost ~ age))
s <- summary(reg)$sigma**2
n <- length(cost)
#variance:
s
## [1] 96277.21
c <-confint(reg, level=0.95)
#conf int for 3 yrs
c(3*c[2]+c[1], 3*c[4]+c[3])
## [1] 182.1401 1330.0605
xf <- data.frame(age=c(seq(1, 10)))
pred <- predict(reg, newdata=xf, interval="confidence")
plot(age, cost, xlim = c(1,10), ylim = c(0, 2200))
lines(pred[,1])
lines(pred[,2], col="red")
lines(pred[,3], col="red")
In this problem we will add mathematical heft to the concept of the leverage of a data point.
(15 pts) Show that the least squares fitted values \(\hat{y}_i\), for \(i=1,\dots, n\), can be written as a linear combination of all observed \(y_j\) values, for \(j=1,\dots,n\). That is, show that we can write \[
\hat{y}_i = \sum_{j=1}^n h_{ij} y_j \quad \mbox{ for } \quad i=1, \dots, n.
\] Hint: \(h_{ii} = h_i\), the leverage of the \(i^{\mathrm{th}}\) data point from lecture.
(5 pts) Now, show that the derivative of \(\hat{y}_i\) with respect to \(y_i\) is the leverage \(h_i\). I.e., \(\frac{d \hat{y}_i}{d y_i} = h_i\). ### Problem 4: Transforms (20 pts)
The file transforms.csv contains 4 pairs of \(x\)s and \(y\)s. For each pair:
transforms <- read.csv("~/Documents/Documents/Machine Learning/transforms.csv", header = TRUE)
plot(transforms$X1, transforms$Y1, xlab = "X1", ylab = "Y1", col= 1); abline(lm(transforms$Y1 ~ transforms$X1), col = 1)
plot(transforms$X2, transforms$Y2, xlab = "X2", ylab = "Y2", col=2); abline(lm(transforms$Y2 ~ transforms$X2), col = 2)
plot(transforms$X3, transforms$Y3, xlab = "X3", ylab = "Y3", col=3); abline(lm(transforms$Y3 ~ transforms$X3), col = 3)
plot(transforms$X4, transforms$Y4, xlab = "X4", ylab = "Y4", col=4); abline(lm(transforms$Y4 ~ transforms$X4), col = 4)
reg1 <- lm(transforms$Y1 ~ transforms$X1)
hist(rstudent(reg1), main="", col = 1); qqnorm(rstudent(reg1), col=1, main=""); plot(reg1$fitted, rstudent(reg1), ylab="student residuals", xlab = "fitted", col=1)
reg2 <- lm(transforms$Y2 ~ transforms$X2)
hist(rstudent(reg2), main="", col = 2); qqnorm(rstudent(reg2), col=2, main=""); plot(reg2$fitted, rstudent(reg2), ylab="student residuals", xlab = "fitted", col=2)
reg3 <- lm(transforms$Y3 ~ transforms$X3)
hist(rstudent(reg3), main="", col = 3); qqnorm(rstudent(reg3), col=3, main=""); plot(reg3$fitted, rstudent(reg3), ylab="student residuals", xlab = "fitted", col=3)
reg4 <- lm(transforms$Y4 ~ transforms$X4)
hist(rstudent(reg4), main="", col = 4); qqnorm(rstudent(reg4), col=4, main=""); plot(reg4$fitted, rstudent(reg4), ylab="student residuals", xlab = "fitted", col=4)
lreg <- lm(log(transforms$Y1) ~ transforms$X1);
plot(transforms$X1, log(transforms$Y1)); abline(lreg, col="red")
X2_2 <- transforms$X2^2
nlreg <- lm(transforms$Y2~ transforms$X2 + X2_2)
xgrid = data.frame(transforms$X2, transforms$X2^2)
plot(transforms$X2, transforms$Y2, pch=20, col=4)
pred = predict(nlreg, newdata=xgrid)
lines(xgrid$transforms.X2, pred, col=2, lty=2)
X3_3 <- transforms$X3^2
logreg <- lm(transforms$Y3 ~ log(transforms$X3) + X3_3);
plot(log(transforms$X3), log(transforms$Y3))
## Warning in log(transforms$Y3): NaNs produced
xgrid2 = data.frame(transforms$X3, transforms$X3^2)
pred = predict(logreg, newdata=xgrid2)
lines(xgrid2$transforms.X3, pred, col=2, lty=2)
lreg2 <- lm(log(transforms$Y4) ~ log(transforms$X4));
plot(log(transforms$X4), log(transforms$Y4)); abline(lreg2, col="red")
plot(lreg$fitted, rstudent(lreg)); abline(h=0, col=1, lty=2)
plot(nlreg$fitted, rstudent(nlreg)); abline(h=0, col=1, lty=2)
plot(logreg$fitted, rstudent(logreg)); abline(h=0, col=1, lty=2)
plot(lreg2$fitted, rstudent(lreg2)); abline(h=0, col=1, lty=2)
Each of i.–v. for each of four pairs will be assigned 1 point.
Data were collected on the average Sunday and daily (i.e., weekday) circulations (in thousands) for 48 of the top 50 newspapers in the United States for the period March–September, 1993. See newspaper.csv.
newspaper <- read.csv("~/Documents/Documents/Machine Learning/newspaper.csv")
attach(newspaper)
summ = summary(reg <- lm(Sunday ~ daily))
xf <- data.frame(daily=c(seq(1, 2000)))
pred <- predict(reg, newdata=xf, interval="prediction")
plot(daily, Sunday)
lines(pred[,1])
lines(pred[,2], col="red")
lines(pred[,3], col="red")
reg <- lm(Sunday ~ daily)
confint(reg, level= 0.95)
## 2.5 % 97.5 %
## (Intercept) -4.344248 156.363861
## daily 1.125645 1.427701
zb1 <- (1.27667 - 1)/0.07503
2*pt(-abs(zb1), df=46)
## [1] 0.0005965783
Hypothesis: Daily circulation is a predictor of amount of Sunday circulation. Since the b1 z value is < .05, this means that yes, the daily circulation is a predictor.
The logarithm is used because the data is mostly centered at the lower end of the lower end of the x values.
reg1 <- lm(log(Sunday) ~ log(daily))
xf <- data.frame(daily=c(seq(1, 2000)))
pred1 <- predict(reg1, newdata=xf, interval="prediction")
plot(log(daily), log(Sunday))
lines(pred1[,1])
lines(pred1[,2], col="red")
lines(pred1[,3], col="red")
reg2 <- lm(log(Sunday) ~ log(daily))
confint(reg2, level= 0.95)
## 2.5 % 97.5 %
## (Intercept) -0.2432072 1.49705
## log(daily) 0.8096909 1.10108
zb1 <- (1.27667 - 1)/0.07503
2*pt(-abs(zb1), df=46)
## [1] 0.0005965783
Hypothesis: Daily circulation is a predictor of amount of Sunday circulation. Since the b1 z value is < .05, this means that yes, the daily circulation is a predictor. Sunday increses by 1% for every .95% change in daily.